Abstract

The simulation of deterministic pushdown automata defined over a one-letter alphabet
by finite state automata is investigated from a descriptional complexity point of view.
We show that each unary deterministic pushdown automaton of size s can be simulated by
a deterministic finite automaton with a number of states that is exponential in s. We
prove that this simulation is tight. Furthermore, its cost cannot be reduced even if it
is performed by a two-way nondeterministic automaton. We also prove that there are unary
languages for which deterministic pushdown automata cannot be exponentially more succinct
than finite automata. In order to state this result, we investigate the conversion of
deterministic pushdown automata into context-free grammars. We prove that in the unary
case the number of variables in the resulting grammar is strictly smaller than the number
of variables needed in the case of nonunary alphabets.
scriptional complexity.



1 Introduction

Deterministic context-free languages and their corresponding devices, deterministic
pushdown automata (dpda's), have been extensively studied in the literature (e.g.,
[10, 16, 17]). They are interesting not only from a theoretical point of view, but even,
and perhaps mainly, for their relevance in connection with the implementation of efficient
parsers.
It is well-known that the class of deterministic context-free languages is a proper subclass 
of that of context-free languages, characterized by (nondeterministic) pushdown automata 
(pda's). In the case of languages defined over a one-letter alphabet, called unary or tally 
languages, these classes collapse: in fact, as proved in [6], each unary context-free lan- 
guage is regular. This implies that unary pda's and unary dpda's can be simulated by 
finite automata. 



*A preliminary version of this work was presented at the 13th International Conference on Implemen- 
tation and Application of Automata, CIAA 2008, San Francisco, USA, July 21-24, 2008. 

† Partially supported by MIUR under the project PRIN "Aspetti matematici e applicazioni emergenti
degli automi e dei linguaggi formali: metodi probabilistici e combinatori in ambito di linguaggi formali". 



In this paper we study the simulation of unary dpda's by finite automata from a descrip- 
tional complexity point of view. As a main result, we get the cost, in terms of the sizes of 
the descriptions, of the optimal simulation between these kinds of devices. 

The problem of the simulation of dpda's by finite automata was previously studied in the 
literature in the case of general alphabets: in [16] it was proved that each dpda of size 
s accepting a regular language can be simulated by a finite automaton with a number of
states bounded by a function which is triply exponential in s. That bound was reduced 
to a double exponential in [17] . It cannot be further reduced because there is a matching 
lower bound [13] . 

We show that in the unary case the situation is different. In fact, we are able to prove 
that each unary dpda of size s can be simulated by a one-way deterministic automaton 
(1dfa) with a number of states exponential in s. We prove that this simulation is tight, by
showing a family of languages exhibiting an exponential gap between the size of dpda's
accepting them, and the number of states of equivalent 1dfa's.

As proved in [12], each n-state unary two-way nondeterministic finite automaton (2nfa) 
can be simulated by a 1dfa with $2^{O(\sqrt{n \log n})}$ states. This suggests the possibility of a
smaller gap between the descriptional complexities of unary dpda's and 2nfa's. However, 
we show that even in this case the gap can be exponential. 

We further deepen the investigation in this subject, in order to discover whether or not 
for each unary regular language there exists an exponential gap between the sizes of 
deterministic pushdown automata and of finite automata. We give a negative answer 
to this question, by showing a family of languages for which unary dpda's cannot be 
exponentially more succinct than finite automata. 

In order to prove this last result, we study the problem of converting unary dpda's into 
equivalent context-free grammars. In general, given a pda with n states and m stack
symbols, the standard conversion technique produces an equivalent grammar with $n^2 m + 1$
variables. As proved in [7], this number cannot be reduced, even if the given pda is
deterministic. Here, we show that in the case of a unary alphabet, a reduction to $2mn$ is
possible.

We briefly mention that the cost of the simulation of unary (nondeterministic) pda's by 
finite automata was studied in [14], where the authors proved that each unary pda with
n states and m stack symbols, such that each push adds exactly one symbol, can be
simulated by a 1dfa with $2^{O(n^4 m^2)}$ states. Our main result reduces this bound to $2^{nm}$,
when the given pda is deterministic.

2 Preliminaries 

Given a set S, we let #S denote its cardinality, and $2^S$ denote the family of all its subsets.

A language L is said to be unary if it is defined over a one-letter alphabet. In this case, 
we let $L \subseteq a^*$. In a similar way, an automaton is unary if its input alphabet contains only
one letter. It is easy to prove the following:

Theorem 1 Let L be a unary language. Then L is regular if and only if there exist two
integers $\mu \ge 0$, $\lambda \ge 1$ such that for each integer $n \ge \mu$, $a^n \in L$ if and only if $a^{n+\lambda} \in L$.

If the constant $\mu$ in Theorem 1 is 0, then L is said to be cyclic or even $\lambda$-cyclic.
Furthermore, in this case, L is said to be properly $\lambda$-cyclic, when it is not $\lambda'$-cyclic for any
$\lambda' < \lambda$. It is immediate to see that the minimum 1dfa accepting a properly $\lambda$-cyclic
language consists of a cycle of $\lambda$ states.
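
As an illustration of the notion of a properly $\lambda$-cyclic language, the following Python sketch
may help. It is not part of the original development: the representation of a $\lambda$-cyclic
language by the set of residues r such that $a^r \in L$, and the name `proper_period`, are
assumptions made here for the example only.

```python
def proper_period(residues, lam):
    """Smallest lam2 >= 1 such that the lam-cyclic language
    L = { a^n : n mod lam in residues } is also lam2-cyclic.

    L is properly lam-cyclic exactly when the returned value equals lam.
    """
    def member(n):
        return (n % lam) in residues

    for lam2 in range(1, lam + 1):
        # Membership depends only on n mod lam, so it is enough to
        # compare n and n + lam2 for n = 0, ..., lam - 1.
        if all(member(n) == member(n + lam2) for n in range(lam)):
            return lam2
    return lam  # not reached: lam itself always passes the test above
```

For instance, under these assumptions the multiples of 8 give `proper_period({0}, 8) == 8`, so the
cycle of the minimum 1dfa has 8 states, while the residues {0, 4} modulo 8 describe the multiples
of 4 and give the value 4.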

A pushdown automaton $M = (Q, \Sigma, \Gamma, \delta, q_0, Z_0, F)$ is said to be deterministic [5] if and
only if for each $q \in Q$, $Z \in \Gamma$ the following hold:

1. if $\delta(q, \varepsilon, Z) \neq \emptyset$ then $\delta(q, a, Z) = \emptyset$, for each $a \in \Sigma$, and

2. for each $\sigma \in \Sigma \cup \{\varepsilon\}$, $\delta(q, \sigma, Z)$ contains at most one element.

A configuration of M is a triple $(q, w, \gamma)$ where q is the current state, w the unread part of
the input, and $\gamma$ the current content of the pushdown store. The leftmost symbol of $\gamma$ is
the topmost stack symbol. As usual, we let $\vdash$ denote the relation between configurations
such that for two configurations $\alpha$ and $\beta$, $\alpha \vdash \beta$ if and only if $\beta$ is reached from $\alpha$ in one
move. We also write $\alpha \vdash^t \beta$ if and only if $\beta$ can be reached from $\alpha$ in $t \ge 0$ moves, and
$\alpha \vdash^* \beta$ if and only if $\alpha \vdash^t \beta$ for some $t \ge 0$.

While in the nondeterministic case acceptance by final states is equivalent to acceptance 
by empty stack, for dpda's the second condition is strictly weaker (dpda's accepting with 
empty stack characterize the class of deterministic context-free languages having the prefix 
property). Hence, the acceptance condition we will consider in the paper is that by final 
states. In particular, given a pda M, we will denote by $L(M)$ the language accepted by it
under such a condition, i.e., $L(M) = \{w \in \Sigma^* \mid \exists q \in F, \gamma \in \Gamma^* : (q_0, w, Z_0) \vdash^* (q, \varepsilon, \gamma)\}$.

In order to simplify the exposition and the proofs of our results, in this paper it is useful 
to consider pda's in a certain normal form [14]:

1. At the start of the computation the pushdown store contains only the start symbol
$Z_0$; this symbol is never pushed on or popped off the stack;

2. the input is accepted if and only if the automaton reaches a final state, and all the 
input has been scanned; 

3. if the automaton moves the input head, then no operations are performed on the 
stack; 

4. every push adds exactly one symbol on the stack. 

The transition function $\delta$ of a pda M then can be written as

$\delta : Q \times (\Sigma \cup \{\varepsilon\}) \times \Gamma \to 2^{Q \times (\{\mathrm{read}, \mathrm{pop}\} \cup \{\mathrm{push}(A) \mid A \in \Gamma\})}$.

In particular, for $q, p \in Q$, $A, B \in \Gamma$, $\sigma \in \Sigma \cup \{\varepsilon\}$, $(p, \mathrm{read}) \in \delta(q, \sigma, A)$ means that the pda
M, in the state q, with A at the top of the stack, by consuming the input $\sigma \in \Sigma$ or not
consuming any input symbol if $\sigma = \varepsilon$, can reach the state p without changing the stack
contents; $(p, \mathrm{pop}) \in \delta(q, \varepsilon, A)$ ($(p, \mathrm{push}(B)) \in \delta(q, \varepsilon, A)$, resp.) means that M, in the state
q, with A at the top of the stack, without reading any input symbol, can reach the state
p by popping off the stack the symbol A on the top (by pushing the symbol B on the top
of the stack, respectively).

It can be easily observed that each pda can be converted into an equivalent pda satisfying 
these conditions. Furthermore, if the given pda is deterministic, then the resulting pda is 
deterministic too. Hence, in the following we will consider dpda's in the above form. 

Now, we have to introduce the measure for the size of pda's we will consider in the paper. 
The literature concerning this point is very limited, and a deeper investigation would
probably be useful. The most extended discussion is presented in [8], where the author
points out that the size of a pda M, denoted as size(M), should be defined by considering 
the total number of symbols needed to write down its description and, more precisely, the 
total number of symbols needed to specify its transition function. Converting a pda into 
normal form, the number of rules in the transition function of the resulting pda is linear 
in the length of the rules of the original pda, which, on the other hand, is bounded by 
some constant. Hence, the total number of symbols specifying the new pda is linear in the 
total number of symbols specifying the original pda. Because the size of a pda in normal 
form is linear in the number of rules of its transition function, and in the deterministic 
case this number is linear in the product of the number of its states and of the number of 
its stack symbols, in the paper we will use such a product as a "reasonable" measure for 
the size of a dpda in normal form. 

The size of a finite automaton is defined to be the number of its states. 

A mode of a pda M is a pair belonging to $Q \times \Gamma$. In the paper, the mode defined by a
state q and a symbol Z will be denoted as [qZ]. The mode of the configuration $(q, x, Z\alpha)$
is [qZ]. Note that in a unary dpda, the mode of a configuration defines the only possible
move.

A dpda M is loop-free if and only if for each $w \in \Sigma^*$ there are $q \in Q$, $\gamma \in \Gamma^*$, $Z \in \Gamma$ such
that $(q_0, w, Z_0) \vdash^* (q, \varepsilon, Z\gamma)$ and $\delta(q, \varepsilon, Z) = \emptyset$, i.e., for each input string the computation
cannot enter an infinite loop of $\varepsilon$-moves. It is known that each dpda can be converted
into an equivalent loop-free dpda [5]. In the unary case such a conversion can be done
without increasing the size of the given dpda. In fact, we can write a procedure that given
a mode [qA] simulates the $\varepsilon$-moves of M in order to make a list of the modes reachable
from the configuration $(q, \varepsilon, A)$. If a mode is visited twice, then the computation enters a
loop. In this case, the transition function of M can be modified by setting $\delta(p, \varepsilon, B) = \emptyset$
for each mode [pB] visited in the simulation. Note that the procedure ends before size(M)
steps. Hence, in the following, without loss of generality, we will suppose that each unary
dpda we consider is loop-free.

3 Simulation of unary dpda's by finite automata



In this section we prove our main result: in fact we show that each unary dpda M can be 
simulated by a 1dfa whose number of states is exponential in the size of M. We will also
show that this simulation is tight. 

Let us consider a given unary dpda M. We start by introducing some useful notions and 
lemmas: 

Definition: Given two modes [qA] and [pB], we define $[qA] \le [pB]$ if and only if there
are integers $k, h \ge 0$ and strings $\alpha, \beta \in \Gamma^*$, such that:

• $(q_0, a^k, Z_0) \vdash^* (q, \varepsilon, A\alpha)$, $(q, a^h, A) \vdash^* (p, \varepsilon, B\beta)$, and

• if $(q_0, a^{k'}, Z_0) \vdash^* (p, \varepsilon, B\beta')$ for some $k' < k$, $\beta' \in \Gamma^*$, then there is an integer $k''$ with
$k' + k'' \le k$ and a state $p' \in Q$, such that $(p, a^{k''}, B) \vdash^* (p', \varepsilon, \varepsilon)$.

Intuitively, $[qA] \le [pB]$ means that M from the initial configuration can reach a
configuration with mode [qA] by a computation $(q_0, a^k, Z_0) \vdash^* (q, \varepsilon, A\alpha)$ and, after that, it can
reach a configuration with mode [pB] by a computation which does not use the portion of
the stack below A, i.e., the portion containing $\alpha$. Furthermore, if during the computation
$(q_0, a^k, Z_0) \vdash^* (q, \varepsilon, A\alpha)$ a configuration with mode [pB] and stack height h is reached, then
in some subsequent step of the same computation the stack height must decrease below
height h. In other words, for all integers $k'$ and $k''$ with $k' + k'' = k$, it is not possible that
$(q_0, a^{k'}, Z_0) \vdash^* (p, \varepsilon, B\beta')$ and $(p, a^{k''}, B) \vdash^* (q, \varepsilon, A\alpha')$, for some $\alpha', \beta' \in \Gamma^*$.

Lemma 1 The relation $\le$ defines a partial order on the set of the modes.

Proof: Clearly, the relation $\le$ is reflexive. To prove that it is antisymmetric, we consider
two modes [qA] and [pB] and we show that $[qA] \le [pB]$ and $[pB] \le [qA]$ imply [qA] = [pB].

By definition of $\le$, for suitable integers k, h, s, t, and strings $\alpha, \beta, \eta, \gamma \in \Gamma^*$, we have:

(a) $(q_0, a^k, Z_0) \vdash^* (q, \varepsilon, A\alpha)$,

(b) $(q, a^h, A) \vdash^* (p, \varepsilon, B\beta)$,

(c) $(q_0, a^s, Z_0) \vdash^* (p, \varepsilon, B\gamma)$,

(d) $(p, a^t, B) \vdash^* (q, \varepsilon, A\eta)$.

Considering (b) and (d), we can observe that when M reaches a configuration with the
mode [qA] ([pB], respectively), the symbol A (B, resp.) will never be popped off the stack,
i.e.:

(e) for each $n \ge 0$, there are $q', p' \in Q$, $\alpha', \beta' \in \Gamma^*$ such that: $(q, a^n, A) \vdash^* (q', \varepsilon, \alpha' A)$ and
$(p, a^n, B) \vdash^* (p', \varepsilon, \beta' B)$.

We now suppose that $s \neq k$. If $s < k$ then from (c) and (a) we get:

(f) $(q_0, a^k, Z_0) \vdash^* (p, a^{k-s}, B\gamma) \vdash^* (q, \varepsilon, A\alpha)$.

By the definition of $\le$, this implies the existence of an integer l with $s + l \le k$ and a state
$p''$ such that $(p, a^l, B) \vdash^* (p'', \varepsilon, \varepsilon)$, which is a contradiction to (e). In a symmetrical way,
by supposing $k < s$, we get a contradiction.

This permits us to conclude that $s = k$ and hence that [qA] = [pB].

We now prove that $\le$ is transitive. To this aim we suppose that $[qA] \le [pB]$ and $[pB] \le [rC]$
and we show that $[qA] \le [rC]$. If [pB] = [rC] then the result is trivial. Hence, from
now on, we suppose $[pB] \neq [rC]$.

We consider integers $k, h, s, t \ge 0$ and strings $\alpha, \beta, \eta, \gamma \in \Gamma^*$ such that:

(a) $(q_0, a^k, Z_0) \vdash^* (q, \varepsilon, A\alpha)$,

(b) $(q, a^h, A) \vdash^* (p, \varepsilon, B\beta)$,

(c) $(q_0, a^s, Z_0) \vdash^* (p, \varepsilon, B\eta)$,

(d) $(p, a^t, B) \vdash^* (r, \varepsilon, C\gamma)$.

From (b) and (d) we get:

(e) $(q, a^{h+t}, A) \vdash^* (r, \varepsilon, C\gamma\beta)$.

Suppose, by contradiction, that $[qA] \le [rC]$ does not hold. Considering the definition of
$\le$, (a) and (e), it turns out that there must exist two integers $k_1$ and $k_2$ with $k_1 + k_2 = k$ such
that:

(f) $(q_0, a^{k_1}, Z_0) \vdash^* (r, \varepsilon, C\gamma_1)$ and

(g) $(r, a^{k_2}, C) \vdash^* (q, \varepsilon, A\gamma_2)$

with $\alpha = \gamma_2\gamma_1$. From (g) and (b) we get:

(h) $(r, a^{k_2+h}, C) \vdash^* (p, \varepsilon, B\beta\gamma_2)$

Because $[rC] \neq [pB]$ and $[pB] \le [rC]$, it turns out that $[rC] \le [pB]$ cannot hold.
Considering (f) and (h) this implies the existence of two integers $k'$ and $k''$ with $k' + k'' = k_1$
such that

(i) $(q_0, a^{k'}, Z_0) \vdash^* (p, \varepsilon, B\gamma')$

(j) $(p, a^{k''}, B) \vdash^* (r, \varepsilon, C\gamma'')$

with $\gamma''\gamma' = \gamma_1$. Hence:

(k) $(p, a^{k''+k_2}, B) \vdash^* (q, \varepsilon, A\gamma_2\gamma'')$

But this, together with (i), gives a contradiction to the hypothesis that $[qA] \le [pB]$.
Hence, we are finally able to conclude that $[qA] \le [rC]$.

□ 

A configuration completely describes the status of a pda in a given instant and gives enough 
information to simulate the remaining steps of a computation. However, in order to study 
the properties of the computations of dpda's, it is useful to have a richer description, which 
also takes into account the states reached in some previous computation steps. To this aim 
we now introduce the notion of history. Before doing that, we observe that the next move 
from a configuration of a unary dpda depends only on the current mode. If such a move 
requires the reading of an input symbol and all the input has been consumed, then the 
computation stops. Hence, given a unary dpda M, for each integer t there exists at most 
one configuration that can be reached after t computation steps. Such a configuration will 
be reached if the input is long enough. 

Definition: For each integer $t \ge 0$, the history $h_t$ of M at the time t is a sequence of
modes $[q_m Z_m][q_{m-1} Z_{m-1}] \cdots [q_1 Z_1]$ such that:

• $Z_m Z_{m-1} \cdots Z_1$ is the content of the stack after the execution of t transitions from
the initial configuration,

• for each integer i, $1 \le i \le m$, $[q_i Z_i]$ was the mode of the last configuration having
stack height i, in the computation $(q_0, x, Z_0) \vdash^t (q_m, \varepsilon, Z_m Z_{m-1} \cdots Z_1)$, for a suitable
$x \in a^*$.

The mode at the time t, denoted as $m_t$, is the leftmost symbol of $h_t$, i.e., the pair
representing the state and the stack top of M after t transitions. In what follows we let H
denote the set of all histories of M, i.e., $H = \{h_t \mid t \ge 0\}$.
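
The definition of history can be made concrete with a short Python sketch. It is only an
illustration under stated assumptions: the dictionary encoding of the transition function
(one move per mode), the function name `histories`, and the toy automaton `TOY` are
hypothetical and do not come from the paper. The sketch also assumes that the input is long
enough, so that reading moves never block.

```python
def histories(delta, q0, z0, steps):
    """Yield h_0, h_1, ..., as tuples of modes (leftmost = stack top).

    delta maps a mode (state, top) to (next_state, op), where op is
    ("read",), ("pop",) or ("push", symbol); in a unary dpda the mode
    determines the move, so the input letter is not needed here.
    """
    hist = [(q0, z0)]                 # hist[i]: mode recorded for stack height i + 1
    yield tuple(reversed(hist))
    for _ in range(steps):
        state, top = hist[-1]
        if (state, top) not in delta:
            return                    # no move is defined: the computation halts
        nxt, op = delta[(state, top)]
        if op[0] == "read":           # same height: only the state changes
            hist[-1] = (nxt, top)
        elif op[0] == "push":         # the mode below keeps the state before the push
            hist.append((nxt, op[1]))
        else:                         # pop: the new top is now the last configuration
            hist.pop()                # of its height, so its state is updated
            hist[-1] = (nxt, hist[-1][1])
        yield tuple(reversed(hist))


# Hypothetical toy automaton: it returns to mode [q0 Z0] after every 7 moves.
TOY = {
    ("q0", "Z0"): ("q1", ("push", "A0")),
    ("q1", "A0"): ("q3", ("read",)),
    ("q3", "A0"): ("q2", ("pop",)),
    ("q2", "Z0"): ("q1", ("push", "B0")),
    ("q1", "B0"): ("q3", ("read",)),
    ("q3", "B0"): ("q3", ("pop",)),
    ("q3", "Z0"): ("q0", ("read",)),
}
```

With this toy automaton, `list(histories(TOY, "q0", "Z0", 7))` starts and ends with the same
history, and the mode $m_t$ is simply the first element of the t-th tuple.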

Lemma 2 Let $h_t = [q_m Z_m][q_{m-1} Z_{m-1}] \cdots [q_1 Z_1]$ be the history at the time t, for a given
$t \ge 0$. Then:

1. For $i = 1, \ldots, m-1$, there is an integer $t_i$ s.t. $h_{t_i} = [q_i Z_i] \cdots [q_1 Z_1]$,
$(q_i, x, Z_i) \vdash^* (q_{i+1}, \varepsilon, Z_{i+1} Z_i)$, for some $x \in a^*$, and $h_{t_i}$ is a suffix of each $h_j$, for each
integer j such that $t_i < j \le t$. Furthermore $0 \le t_1 < t_2 < \cdots < t_{m-1} < t$.

2. If all the modes in $h_t$ are different then $[q_1 Z_1] \le \cdots \le [q_m Z_m]$.

3. If $h_\mu = h_{\mu+\lambda}$ for some $\mu \ge 0$, $\lambda \ge 1$, then $h_{\mu+i} = h_{\mu+\lambda+i}$, for each $i \ge 0$.

Because the start symbol $Z_0$ is never popped off the stack, we can observe that in each history
the symbol $Z_1$ of the rightmost mode coincides with $Z_0$.

Proof: For each i, $1 \le i \le m$, let $t_i \ge 0$ be the largest integer such that $|h_{t_i}| = i$. (Note
that $t_m = t$.) Hence, the stack height at each step j, $t_i < j \le t$, must be greater than i. This
implies that the first i symbols on the stack cannot be modified after step $t_i$, i.e., $h_{t_i} =
[q_i Z_i] \cdots [q_1 Z_1]$, and, in the case $i < m$, $(q_i, x, Z_i) \vdash^* (q_{i+1}, \varepsilon, Z_{i+1} Z_i)$, for some input x.
Hence, (1) easily follows.

To prove (2), we also observe that $(q_0, a^k, Z_0) \vdash^* (q_i, \varepsilon, Z_i Z_{i-1} \cdots Z_1)$, for some $k \ge 0$.
Suppose that $[q_i Z_i] \le [q_{i+1} Z_{i+1}]$ is not true. Hence, $(q_0, a^{k'}, Z_0) \vdash^* (q_{i+1}, \varepsilon, Z_{i+1}\gamma')$ and
$(q_{i+1}, a^{k''}, Z_{i+1}) \vdash^* (q_i, \varepsilon, Z_i\gamma'' Z_{i+1})$ for some $k', k''$, with $k' + k'' = k$ and $\gamma', \gamma'' \in \Gamma^*$. Thus,
$Z_i\gamma'' Z_{i+1}\gamma' = Z_i \cdots Z_1$ and $[q_{i+1} Z_{i+1}] = [q_j Z_j]$ for some $j < i$, which is a contradiction to
the initial hypothesis that $h_t$ does not contain any repetition.

Hence, we get that $[q_i Z_i] \le [q_{i+1} Z_{i+1}]$ and (2) follows by Lemma 1.

To prove (3), we observe that $h_\mu = h_{\mu+\lambda}$ implies that the configurations reached at the
steps $\mu$ and $\mu + \lambda$ coincide. Since M is unary and deterministic, it immediately follows
that for each $i \ge 0$ at steps $\mu + i$ and $\mu + \lambda + i$ the same move is performed. Hence,
$h_{\mu+i} = h_{\mu+\lambda+i}$, for each $i \ge 0$. □



Lemma 3 The set H contains infinitely many histories if and only if there exist two
integers $\mu \ge 0$, $\lambda \ge 1$, and $\lambda$ nonempty sequences of modes $h_1, \ldots, h_\lambda$, such that

$h_{\mu+1} = h_1 h_\mu$, $h_{\mu+2} = h_2 h_\mu$, $\ldots$, $h_{\mu+k\lambda+i} = h_i (h_\lambda)^k h_\mu$,

for all integers $k \ge 0$, $0 < i \le \lambda$.

Furthermore, if such $\mu$ and $\lambda$ exist then their sum does not exceed $2^{\#Q \cdot \#\Gamma}$, while if H is
finite then its cardinality is less than $2^{\#Q \cdot \#\Gamma}$.

Proof: Suppose that H contains infinitely many elements, and consider the smallest
index t such that the history $h_t = [q_m Z_m] \cdots [q_1 Z_1]$ contains a repetition. In the light of
Lemma 2(1), the mode $[q_m Z_m]$ must be repeated in $h_t$, namely there is an index i, $0 < i < m$,
such that $[q_m Z_m] = [q_i Z_i]$, an integer $\mu$, $1 \le \mu < t$, such that $h_\mu = [q_i Z_i] \cdots [q_1 Z_1]$, and
some sequences $h_1, \ldots, h_\lambda$, where $\lambda = t - \mu$, such that $h_{\mu+1} = h_1 h_\mu$, $\ldots$, $h_{\mu+\lambda} = h_\lambda h_\mu$.
Note that the sequences $h_i$ cannot be empty (otherwise, by Lemma 2(3), H cannot contain
infinitely many elements). Because the transitions after time $\mu$ depend only on the mode
$[q_i Z_i]$ and on the modes in the sequences $h_1, \ldots, h_\lambda$, and the mode at the time $\mu + \lambda = t$
is $[q_i Z_i]$, then it is not difficult to conclude that $h_{\mu+\lambda+1} = h_1 h_\lambda h_\mu$, $h_{\mu+\lambda+2} = h_2 h_\lambda h_\mu$,
$\ldots$, $h_{\mu+k\lambda+i} = h_i (h_\lambda)^k h_\mu$, for $k \ge 0$, $0 < i \le \lambda$.

The converse is trivial. 

Finally, we observe that, by Lemma 2(2), the sets of modes belonging to two different
histories $h_t$ and $h_{t'}$ not containing any repetition must be different. This implies that
the number of histories without repetitions does not exceed the number of all possible
nonempty sets of modes, i.e., it is at most $2^{\#Q \cdot \#\Gamma} - 1$. Hence, if the history $h_{2^{\#Q \cdot \#\Gamma}}$ does
not contain any repetition, then it coincides with some history $h_t$, for a $t < 2^{\#Q \cdot \#\Gamma}$. By
Lemma 2(3) this implies that H is finite. □

Lemma 4 The sequence $(m_t)_{t \ge 0}$ is ultimately periodic. More precisely, there are integers
$\mu \ge 0$, $\lambda \ge 1$ such that $\mu + \lambda \le 2^{\#Q \cdot \#\Gamma}$ and $m_t = m_{t+\lambda}$, for each $t \ge \mu$.

Proof: By Lemma 2(3), if H is finite then $(h_t)_{t \ge 0}$ is ultimately periodic, and hence even
$(m_t)_{t \ge 0}$ is ultimately periodic. Note that, as a consequence of Lemma 3, in this case the
set H cannot contain more than $2^{\#Q \cdot \#\Gamma} - 1$ elements. This gives the upper bound on
$\mu + \lambda$.

If H is infinite then the sequence of histories $(h_t)_{t \ge 0}$ is not periodic. However, the sequence
of modes $(m_t)_{t \ge 0}$ is defined by the leftmost symbols of $(h_t)_{t \ge 0}$. Hence, by Lemma 3, it is
ultimately periodic, with $\mu + \lambda \le 2^{\#Q \cdot \#\Gamma}$. □

Now, we are ready to prove our main result: 

Theorem 2 Let $L \subseteq a^*$ be accepted by a dpda M in normal form with n states and m
stack symbols. Then L is accepted by a 1dfa with at most $2^{mn}$ states.

Proof: The acceptance or rejection of a word depends only on the states that are reached
by consuming it (and possibly performing some $\varepsilon$-moves). By Lemma 4 the sequence of
the modes that can be reached in computation steps is ultimately periodic. This implies
that also the sequence of the reached states, which gives the acceptance or the rejection,
is ultimately periodic. Hence, it is possible to build a 1dfa accepting the language. The
upper bound on the number of the states derives from Lemma 4. □

As a consequence of Theorem 2, each unary dpda M of size s can be simulated by a 1dfa
with a number of states exponential in s. We now prove that such a simulation is optimal.
In particular, we show that for each integer s there exists a language which is accepted by
a dpda of size $O(s)$ such that any equivalent 1dfa needs $2^s$ states.

More precisely, for each integer s, we consider the set of the multiples of $2^s$, written in
unary notation, namely the language $L_s = \{a^{2^s}\}^*$.

Given $s > 0$, we can build a dpda accepting $L_s$ that, from the initial configuration,
reaches a configuration with the state $q_0$ and the pushdown containing only $Z_0$, every
time it consumes an input factor of length $2^s$, i.e., $(q_0, a^{2^s}, Z_0) \vdash^* (q_0, \varepsilon, Z_0)$. The state
$q_0$ is the only final state and it cannot be reached in the other steps of the computation.
The computation from $(q_0, a^{2^s}, Z_0)$ to $(q_0, \varepsilon, Z_0)$ uses a procedure that, given an integer
i, consumes $2^i$ input symbols. For $i > 0$ the procedure makes two recursive calls, each
one of them consuming $2^{i-1}$ symbols. In the implementation, two stack symbols $A_{i-1}$ and
$B_{i-1}$ are used, respectively, to keep track of the first and of the second recursive call of the
procedure. For example, for $s = 3$, a configuration with the pushdown store containing
$B_0 A_1 B_2 Z_0$ will be reached after consuming $2^2 + 2^0$ input symbols and performing some
$\varepsilon$-moves. The formal definition is below:

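• $Q = \{q_0, q_1, q_2, q_3\}$

• $\Gamma = \{Z_0, A_0, A_1, \ldots, A_{s-1}, B_0, B_1, \ldots, B_{s-1}\}$

• $\delta(q_0, \varepsilon, Z_0) = \{(q_1, \mathrm{push}(A_{s-1}))\}$
  $\delta(q_1, a, A_0) = \{(q_3, \mathrm{read})\}$
  $\delta(q_1, a, B_0) = \{(q_3, \mathrm{read})\}$
  $\delta(q_1, \varepsilon, A_i) = \delta(q_1, \varepsilon, B_i) = \{(q_1, \mathrm{push}(A_{i-1}))\}$, for $i = 1, \ldots, s-1$
  $\delta(q_2, \varepsilon, A_i) = \delta(q_2, \varepsilon, B_i) = \{(q_1, \mathrm{push}(B_{i-1}))\}$, for $i = 1, \ldots, s-1$
  $\delta(q_3, \varepsilon, A_i) = \{(q_2, \mathrm{pop})\}$, for $i = 0, \ldots, s-1$
  $\delta(q_3, \varepsilon, B_i) = \{(q_3, \mathrm{pop})\}$, for $i = 0, \ldots, s-1$
  $\delta(q_2, \varepsilon, Z_0) = \{(q_1, \mathrm{push}(B_{s-1}))\}$
  $\delta(q_3, \varepsilon, Z_0) = \{(q_0, \mathrm{read})\}$

• $F = \{q_0\}$.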
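
The construction above can be checked experimentally on small values of s. The following
Python sketch is an illustration only: the dictionary encoding of the transition function,
the string names of states and stack symbols, and the helpers `build_dpda` and `accepts` are
assumptions introduced here, not part of the paper. The move from $[q_3 Z_0]$ is encoded as an
$\varepsilon$-move that leaves the stack unchanged.

```python
def build_dpda(s):
    """Transitions of the dpda for L_s, keyed by (state, input, stack top).

    The input component is "a" for a reading move and "" for an epsilon-move.
    """
    assert s >= 1
    delta = {
        ("q0", "", "Z0"): ("q1", ("push", f"A{s-1}")),
        ("q2", "", "Z0"): ("q1", ("push", f"B{s-1}")),
        ("q3", "", "Z0"): ("q0", ("read",)),    # epsilon-move, stack unchanged
        ("q1", "a", "A0"): ("q3", ("read",)),
        ("q1", "a", "B0"): ("q3", ("read",)),
    }
    for i in range(s):
        delta[("q3", "", f"A{i}")] = ("q2", ("pop",))
        delta[("q3", "", f"B{i}")] = ("q3", ("pop",))
    for i in range(1, s):
        for c in "AB":
            delta[("q1", "", f"{c}{i}")] = ("q1", ("push", f"A{i-1}"))
            delta[("q2", "", f"{c}{i}")] = ("q1", ("push", f"B{i-1}"))
    return delta


def accepts(delta, finals, k, q0="q0", z0="Z0"):
    """Decide whether a^k is accepted by final state.

    Assumes a loop-free dpda in normal form (otherwise the loop below
    might not terminate) whose start symbol is never popped.
    """
    state, stack, left = q0, [z0], k
    while True:
        if left == 0 and state in finals:
            return True                          # final state, input consumed
        top = stack[-1]
        if (state, "", top) in delta:            # epsilon-moves have priority
            state, op = delta[(state, "", top)]
        elif left > 0 and (state, "a", top) in delta:
            state, op = delta[(state, "a", top)]
            left -= 1
        else:
            return False                         # blocked, or input exhausted
        if op[0] == "push":
            stack.append(op[1])
        elif op[0] == "pop":
            stack.pop()
```

Under these assumptions, for a small s one can compare `accepts(build_dpda(s), {"q0"}, k)` with
the condition that k is a multiple of $2^s$; the sketch is meant as a sanity check of the
construction, not as a substitute for the inductive argument.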



Theorem 3 For each integer $s > 0$, the language $L_s$ is accepted by a dpda of size $8s + 4$
but the minimum 1dfa accepting it contains exactly $2^s$ states.



Proof: First, we prove by induction on $i = 0, \ldots, s-1$, that $(q_1, a^{2^i}, A_i) \vdash^* (q_2, \varepsilon, \varepsilon)$ and
$(q_1, a^{2^i}, B_i) \vdash^* (q_3, \varepsilon, \varepsilon)$. The basis, $i = 0$, is trivial. For $i > 0$ the computations, obtained
using the induction hypothesis, are the following, where the symbol C can be replaced by
$A_i$ and by $B_i$:

$(q_1, a^{2^i}, C) \vdash (q_1, a^{2^i}, A_{i-1}C) \vdash^* (q_2, a^{2^{i-1}}, C) \vdash (q_1, a^{2^{i-1}}, B_{i-1}C) \vdash^* (q_3, \varepsilon, C)$,

and the last step is $(q_3, \varepsilon, A_i) \vdash (q_2, \varepsilon, \varepsilon)$ or $(q_3, \varepsilon, B_i) \vdash (q_3, \varepsilon, \varepsilon)$.

As a consequence, the dpda of size $8s + 4$ defined above recognizes $L_s$. Because $L_s$ is
properly $2^s$-cyclic, the minimum 1dfa accepting it has $2^s$ states. □

Using Theorem 9 of [11], it is possible to prove that also any 2nfa accepting the language
$L_s$ must have at least $2^s$ states. Hence we get the following:

Corollary: Unary deterministic pushdown automata can be exponentially more succinct
than two-way nondeterministic finite automata. 



4 Unary dpda's and context-free grammars 



In this section we study the conversion of unary dpda's into context-free grammars. Given 
a pda with n states and m stack symbols, the standard conversion produces a context-free 
grammar with $n^2 m + 1$ variables. In [7] it has been proved that such a number cannot be
reduced, even if the given pda is deterministic. As we prove in this section, in the unary 
case the situation is different. In fact, we show how to get a grammar with 2nm variables. 
This transformation will be useful in the last part of the paper to prove the existence of 
languages for which dpda's cannot be exponentially more succinct than 1dfa's.

Let $M = (Q, \{a\}, \Gamma, \delta, q_0, Z_0, F)$ be a unary dpda in normal form.

First of all, we observe that for each mode [qA] there exists at most one state p such
that $(q, x, A) \vdash^* (p, \varepsilon, \varepsilon)$ for some $x \in a^*$. We denote such a state by exit[qA] and we call
the sequence of moves from $(q, x, A)$ to $(p, \varepsilon, \varepsilon)$ the segment of computation from [qA].
Note that given two modes [qA] and [q'A], if $(q, x, A) \vdash^* (q', \varepsilon, A)$, for some $x \in a^*$, then
exit[qA] = exit[q'A].

We now define a grammar $G = (V, \{a\}, P, S)$ and we will show that it is equivalent to M.
The set of variables is $V = Q \times \Gamma \times \{0, 1\}$. The elements of V will be denoted as $[qA]_b$,
where [qA] is a mode and $b \in \{0, 1\}$. The start symbol of the grammar is $S = [q_0 Z_0]_1$.

The productions of G are defined in order to derive from each variable $[qA]_0$ the string x
consumed in the segment of computation from [qA], and from each variable $[qA]_1$ all the
strings x such that M, from a configuration with mode [qA], can reach a final configuration,
consuming x, before completing the segment from [qA]. They are listed below, by
considering the possible moves of M:

• Push moves: For $\delta(q, \varepsilon, A) = \{(p, \mathrm{push}(B))\}$, there is the production

(a) $[qA]_1 \to [pB]_1$

Furthermore, if exit[pB] is defined, with exit[pB] = q', then there are the productions

(b) $[qA]_0 \to [pB]_0 [q'A]_0$

(c) $[qA]_1 \to [pB]_0 [q'A]_1$

• Pop moves: For $\delta(q, \varepsilon, A) = \{(p, \mathrm{pop})\}$, there is the production

(d) $[qA]_0 \to \varepsilon$

• Read moves: For $\delta(q, \sigma, A) = \{(p, \mathrm{read})\}$, with $\sigma \in \{\varepsilon, a\}$, and for each $b \in \{0, 1\}$,
there is the production

(e) $[qA]_b \to \sigma [pA]_b$

• Acceptance: For each final state $q \in F$, there is the production

(f) $[qA]_1 \to \varepsilon$

The productions from a variable $[qA]_0$ are similar to those used in the standard conversion
from pda's (accepting by empty stack) to context-free grammars. The productions from
variables $[qA]_1$ are used to guess that at some point the computation will stop in a final state.
For example, for the push move $(p, \mathrm{push}(B)) \in \delta(q, \varepsilon, A)$, we can guess that the acceptance
will be reached in the segment of computation which starts from the mode [pB] (hence,
ending the computation before reaching the same stack level as in the starting mode [qA],
see production (a)), or after that segment is completed (production (c)).
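
The productions (a)-(f) can be generated mechanically. The Python sketch below is a
minimal illustration under stated assumptions: the encoding of $\delta$ as a dictionary keyed by
(state, input, stack top), the triple (q, A, b) standing for the variable $[qA]_b$, the helper
names `exits`, `grammar` and `lengths`, and the toy automaton `TOY` (which accepts the even
lengths) are all hypothetical. The function `lengths` only enumerates the lengths of the derived
strings up to a bound; it is a check, not part of the conversion.

```python
def exits(delta):
    """Least fixed point computing exit[(q, A)], where defined."""
    ex, changed = {}, True
    while changed:
        changed = False
        for (q, _sigma, A), (p, op) in delta.items():
            if (q, A) in ex:
                continue
            if op[0] == "pop":
                val = p
            elif op[0] == "read":
                val = ex.get((p, A))
            else:                       # push B: segment from [pB], then from [q'A]
                q1 = ex.get((p, op[1]))
                val = ex.get((q1, A)) if q1 is not None else None
            if val is not None:
                ex[(q, A)] = val
                changed = True
    return ex


def grammar(delta, finals):
    """Productions as (head, terminal string, list of body variables)."""
    ex, prods = exits(delta), []
    for (q, sigma, A), (p, op) in delta.items():
        if op[0] == "push":
            B = op[1]
            prods.append(((q, A, 1), "", [(p, B, 1)]))                   # (a)
            if (p, B) in ex:
                q1 = ex[(p, B)]
                prods.append(((q, A, 0), "", [(p, B, 0), (q1, A, 0)]))   # (b)
                prods.append(((q, A, 1), "", [(p, B, 0), (q1, A, 1)]))   # (c)
        elif op[0] == "pop":
            prods.append(((q, A, 0), "", []))                            # (d)
        else:
            for b in (0, 1):
                prods.append(((q, A, b), sigma, [(p, A, b)]))            # (e)
    symbols = {A for (_, _, A) in delta}
    symbols |= {op[1] for _, op in delta.values() if op[0] == "push"}
    for q in finals:
        for A in symbols:
            prods.append(((q, A, 1), "", []))                            # (f)
    return prods


def lengths(prods, start, bound):
    """All n <= bound such that start derives a^n (fixed-point iteration)."""
    lang, changed = {}, True
    while changed:
        changed = False
        for head, sigma, body in prods:
            sums = {len(sigma)}
            for var in body:
                sums = {u + v for u in sums for v in lang.get(var, ())
                        if u + v <= bound}
            new = sums - lang.setdefault(head, set())
            if new:
                lang[head] |= new
                changed = True
    return lang.get(start, set())


# Hypothetical toy dpda in normal form accepting the strings of even length.
TOY = {
    ("q0", "", "Z0"): ("q1", ("push", "A0")),
    ("q1", "a", "A0"): ("q3", ("read",)),
    ("q3", "", "A0"): ("q2", ("pop",)),
    ("q2", "", "Z0"): ("q1", ("push", "B0")),
    ("q1", "a", "B0"): ("q3", ("read",)),
    ("q3", "", "B0"): ("q3", ("pop",)),
    ("q3", "", "Z0"): ("q0", ("read",)),
}
```

On this toy automaton, `lengths(grammar(TOY, {"q0"}), ("q0", "Z0", 1), 6)` returns the even
numbers up to 6, and the variables used as heads of productions are at most $2mn = 24$, since
here n = 4 and m = 3.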

In order to show that the grammar G is equivalent to M, it is useful to prove the following 
lemma: 

Lemma 5 For each mode [qA], $x \in a^*$, the following hold:

1. $[qA]_0 \Rightarrow^* x$ if and only if $(q, x, A) \vdash^* (\mathrm{exit}[qA], \varepsilon, \varepsilon)$.

2. $[qA]_1 \Rightarrow^* x$ if and only if $(q, x, A) \vdash^* (q', \varepsilon, \gamma)$, for some $q' \in F$, $\gamma \in \Gamma^+$.

In the standard conversion, variables of the form [qAp] are used, where p represents one possible
"exit" from the segment from [qA]. In the case under consideration, there is at most one possible
exit, namely exit[qA].

Proof: To prove (1), we show by induction that for each integer $k \ge 1$, $[qA]_0 \Rightarrow^k x$ if and
only if $(q, x, A) \vdash^* (\mathrm{exit}[qA], \varepsilon, \varepsilon)$.

First of all, we observe that the case k = 1, which corresponds to productions (d) and to 
pop moves, is trivial. For the inductive step, we consider three subcases, depending on 
the move allowed from the mode [qA] . 

• $\delta(q, \varepsilon, A) = \{(p, \mathrm{push}(B))\}$:

Let q' = exit[pB] and suppose that $[qA]_0 \Rightarrow^k x$. Then, $[qA]_0 \Rightarrow [pB]_0 [q'A]_0$,
$[pB]_0 \Rightarrow^{k'} x'$, $[q'A]_0 \Rightarrow^{k''} x''$, for some $k', k'' > 0$, $x', x''$ such that $k' + k'' = k - 1$ and
$x'x'' = x$. By the induction hypothesis $(p, x', B) \vdash^* (q', \varepsilon, \varepsilon)$ and
$(q', x'', A) \vdash^* (\mathrm{exit}[q'A], \varepsilon, \varepsilon)$. As observed above, exit[q'A] coincides with exit[qA].
Hence: $(q, x, A) \vdash (p, x'x'', BA) \vdash^* (q', x'', A) \vdash^* (\mathrm{exit}[qA], \varepsilon, \varepsilon)$, that implies
$(q, x, A) \vdash^* (\mathrm{exit}[qA], \varepsilon, \varepsilon)$. In a similar way, the converse can be proved.

• $\delta(q, \varepsilon, A) = \{(p, \mathrm{pop})\}$: impossible for $k > 1$.

• $\delta(q, \sigma, A) = \{(p, \mathrm{read})\}$, with $\sigma \in \{a, \varepsilon\}$:

By production (e), $[qA]_0 \Rightarrow \sigma [pA]_0$. Furthermore, $(q, \sigma, A) \vdash (p, \varepsilon, A)$. By the
induction hypothesis, for each terminal string y, $[pA]_0 \Rightarrow^{k-1} y$ if and only if
$(p, y, A) \vdash^* (\mathrm{exit}[pA], \varepsilon, \varepsilon)$. The proof can be easily completed, by choosing y such that
$x = \sigma y$, and by observing that exit[pA] must coincide with exit[qA].

(2) Let us start by proving the "only if" part, by induction on the length k of the derivation
$[qA]_1 \Rightarrow^k x$.

For the basis, $k = 1$, the derivation must consist only of a production of the form (f).
This implies that $q \in F$. Hence the corresponding computation is trivial and consists only
of the configuration $(q, \varepsilon, A)$. For $k > 1$ we consider different subcases, depending on the
first used production:

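• Production (a), namely $[qA]_1 \to [pB]_1$, with $\delta(q, \varepsilon, A) = \{(p, \mathrm{push}(B))\}$:

$[pB]_1 \Rightarrow^{k-1} x$ and, by inductive hypothesis, $(p, x, B) \vdash^* (q', \varepsilon, \gamma)$, for some $q' \in F$,
$\gamma \in \Gamma^+$. Hence: $(q, x, A) \vdash (p, x, BA) \vdash^* (q', \varepsilon, \gamma A)$.

• Production (c), namely $[qA]_1 \to [pB]_0 [q'A]_1$, with q' = exit[pB] and $\delta(q, \varepsilon, A) =
\{(p, \mathrm{push}(B))\}$:

$[pB]_0 \Rightarrow^{k'} x'$, $[q'A]_1 \Rightarrow^{k''} x''$, with $x'x'' = x$, $k' + k'' = k - 1$. From (1) we get that
$(p, x', B) \vdash^* (q', \varepsilon, \varepsilon)$ and, from the inductive hypothesis, $(q', x'', A) \vdash^* (q'', \varepsilon, \gamma)$, with
$q'' \in F$, $\gamma \in \Gamma^+$. Hence: $(q, x, A) \vdash (p, x'x'', BA) \vdash^* (q', x'', A) \vdash^* (q'', \varepsilon, \gamma)$.

• Production (e), namely $[qA]_1 \to \sigma [pA]_1$, with $\sigma \in \{a, \varepsilon\}$, $x = \sigma y$, and $\delta(q, \sigma, A) =
\{(p, \mathrm{read})\}$:

$[pA]_1 \Rightarrow^{k-1} y$ and, by inductive hypothesis, $(p, y, A) \vdash^* (q', \varepsilon, \gamma)$, for some $q' \in F$,
$\gamma \in \Gamma^+$. Hence: $(q, x, A) \vdash^* (q', \varepsilon, \gamma)$.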

We now prove the "if" part, by induction on the number k of moves in a computation
$(q, x, A) \vdash^k (q', \varepsilon, \gamma)$, with $q' \in F$, $\gamma \in \Gamma^+$.

If $k = 0$ then $q = q'$ and $x = \varepsilon$. The trivial computation is simulated by the derivation
consisting only of the production (f).

For $k > 0$, we consider different subcases, depending on the first move of the automaton:

• $\delta(q, \varepsilon, A) = \{(p, \mathrm{push}(B))\}$:

$(q, x, A) \vdash (p, x, BA) \vdash^{k-1} (q', \varepsilon, \gamma)$. Because $\gamma$ is not empty, during the given
computation the symbol A cannot be removed from the stack. Hence $\gamma = \gamma' A$, for some
$\gamma' \in \Gamma^*$, and $(p, x, B) \vdash^{k-1} (q', \varepsilon, \gamma')$.

If $\gamma' = \varepsilon$ then q' = exit[pB] and, by (1), $[pB]_0 \Rightarrow^* x$. Hence $[qA]_1 \Rightarrow [pB]_0 [q'A]_1 \Rightarrow^*
x [q'A]_1 \Rightarrow x$ (since $q' \in F$, in the last step the production (f) is used).

On the other hand, if $\gamma' \neq \varepsilon$, then by the inductive hypothesis, it turns out that
$[pB]_1 \Rightarrow^* x$. Hence, using production (a), $[qA]_1 \Rightarrow [pB]_1 \Rightarrow^* x$.

• $\delta(q, \varepsilon, A) = \{(p, \mathrm{pop})\}$:

This case is not possible because it would imply $k = 1$, $x = \varepsilon$, $p \in F$, and $\gamma$ empty.

• $\delta(q, \sigma, A) = \{(p, \mathrm{read})\}$, with $\sigma \in \{a, \varepsilon\}$, $x = \sigma y$, $y \in a^*$:

$(q, \sigma y, A) \vdash (p, y, A) \vdash^{k-1} (q', \varepsilon, \gamma)$. By inductive hypothesis $[pA]_1 \Rightarrow^* y$. Hence:
$[qA]_1 \Rightarrow \sigma [pA]_1 \Rightarrow^* \sigma y = x$.

□ 

As a consequence of the lemma just proved, it turns out that, for each $x \in a^{*}$, $[q_0 Z_0]_1 \Rightarrow^{*} x$ if and only
if $x$ is accepted by $M$. Hence, we get the following result:

Theorem 4 For any unary deterministic pushdown automaton M in normal form, with 
n states and m pushdown symbols, there exists an equivalent context-free grammar with 
at most 2mn variables, such that the right hand side of each production contains at most 
two symbols. 

Finally, we observe that from the grammar $G$ defined above it is easy to obtain a grammar
in Chomsky normal form generating $L(M) - \{\epsilon\}$. This can require one more variable.

5 Immediate acceptance/rejection 

Because dpda's can perform $\epsilon$-moves, in order to decide whether or not an input string $w$
is accepted, it is not enough to consider only the configuration reached immediately after
reading the last symbol of $w$: the configurations reachable in further steps, via
$\epsilon$-moves, must also be taken into account. In this section we show how to modify a unary pda,
accepting by final states, in order to be able to decide the acceptance or the rejection of
an input string $w$ just by considering the configuration reached immediately after reading
the last symbol of $w$. This result will be useful for a construction presented in Section 6.

More precisely, let us consider a unary (deterministic or nondeterministic) pda $M =
(Q, \{a\}, \Gamma, \delta, q_0, Z_0, F)$ in normal form, accepting by final states. We define another pda
$M'$, where each transition $(p,\mathrm{read}) \in \delta(q,a,A)$ of $M$ is replaced with an $\epsilon$-transition,
postponing the reading of the symbol $a$ until a final state is reached or the following input
symbol should be read.

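More formally, $M' = (Q', \{a\}, \Gamma, \delta', q'_0, Z_0, F')$, with $Q' = Q \cup \bar{Q} \cup \{q'_0\}$, where $\bar{Q}$ is an
isomorphic copy of $Q$ and the transition function $\delta'$ is defined as follows, for $q \in Q$, $\bar{q} \in \bar{Q}$,
$\sigma \in \{\epsilon,a\}$, $A \in \Gamma$:

• $\delta'(q,\epsilon,A) = \delta(q,\epsilon,A) \cup \{(\bar{p},\mathrm{read}) \mid (p,\mathrm{read}) \in \delta(q,a,A)\}$

• $\delta'(q,a,A) = \emptyset$

• $\delta'(\bar{q},\sigma,A) = \begin{cases} \{(\bar{p},\alpha) \mid (p,\alpha) \in \delta(q,\sigma,A)\} & \text{if } q \notin F \\ \{(q,\mathrm{read})\} & \text{if } q \in F \text{ and } \sigma = a \\ \emptyset & \text{otherwise} \end{cases}$

• $\delta'(q'_0,\epsilon,Z_0) = \{(q_0,\mathrm{read})\}$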

Intuitively, the states in $\bar{Q}$ are used to remember the debt of one read operation. The
debt is paid when a final state is reached. However, if in the original pda $M$ the read of
a further symbol must be performed before reaching a final state, then in $M'$ a read is
executed, without canceling the debt.

The new initial state $q'_0$ is useful when $q_0$ is not accepting, but the empty word must be
accepted, i.e., in the original automaton there is a sequence of transitions leading from $q_0$
to a final state, without consuming any input symbol. Hence:



$$F' = \begin{cases} F \cup \{q'_0\} & \text{if } \epsilon \text{ is accepted by } M \\ F & \text{otherwise.} \end{cases}$$
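The construction of $M'$ can be sketched in Python as follows. This is only an illustration under assumptions that are not part of the paper: transitions are stored in a dictionary keyed by (state, input label, top of stack), barred states are encoded as pairs `("bar", q)`, the fresh initial state is named `"q0'"`, and the helper `run` is a small breadth-first simulator that assumes finitely many reachable configurations. The three-state example pda is hypothetical.

```python
from collections import deque

EPS = ""  # label of epsilon-moves; "a" is the only input letter


def bar(q):
    """Barred copy of state q: one read operation is still owed."""
    return ("bar", q)


def postpone_reads(delta, states, gamma, q0, z0, final, accepts_empty):
    """Build (delta', q0', F') of M' from a unary pda M in normal form.

    delta maps (state, sigma, top) to a set of (state, op), where sigma is
    EPS or "a" and op is "read", "pop" or ("push", B).
    """
    q0_new = "q0'"  # assumed not to clash with a state of M
    d = {}
    for q in states:
        for top in gamma:
            # Unbarred q: keep the epsilon-moves of M; a read of "a" becomes
            # an epsilon-move into the barred copy (the read is now owed).
            moves = set(delta.get((q, EPS, top), ()))
            moves |= {(bar(p), op) for p, op in delta.get((q, "a", top), ())}
            if moves:
                d[(q, EPS, top)] = moves
            for sigma in (EPS, "a"):
                if q not in final:
                    # Barred, not final: copy the moves of M, keeping the debt.
                    moves = {(bar(p), op)
                             for p, op in delta.get((q, sigma, top), ())}
                elif sigma == "a":
                    moves = {(q, "read")}  # barred, final: pay the debt
                else:
                    moves = set()
                if moves:
                    d[(bar(q), sigma, top)] = moves
    d[(q0_new, EPS, z0)] = {(q0, "read")}
    final_new = set(final) | ({q0_new} if accepts_empty else set())
    return d, q0_new, final_new


def run(delta, q0, z0, final, k, immediate):
    """Decide whether a^k is accepted.

    immediate=False: a final state is reachable once the input is consumed.
    immediate=True: the state entered by the k-th read (q0 if k == 0) is final.
    Assumes that only finitely many configurations are reachable.
    """
    if immediate and k == 0:
        return q0 in final
    start = (q0, k, (z0,))
    seen, todo = {start}, deque([start])
    while todo:
        q, left, stack = todo.popleft()
        if not immediate and left == 0 and q in final:
            return True
        if not stack:
            continue
        for sigma in (EPS, "a"):
            if sigma == "a" and left == 0:
                continue
            for p, op in delta.get((q, sigma, stack[0]), ()):
                if op == "read":
                    new_stack = stack
                elif op == "pop":
                    new_stack = stack[1:]
                else:  # ("push", B)
                    new_stack = (op[1],) + stack
                new_left = left - 1 if sigma == "a" else left
                if immediate and sigma == "a" and new_left == 0:
                    if p in final:
                        return True
                    continue  # the last symbol has been read: stop here
                nxt = (p, new_left, new_stack)
                if nxt not in seen:
                    seen.add(nxt)
                    todo.append(nxt)
    return False


# Hypothetical example: M accepts a^k for odd k, but only after an epsilon-move.
M_DELTA = {
    (0, "a", "Z"): {(1, "read")},
    (1, EPS, "Z"): {(2, "read")},
    (2, "a", "Z"): {(0, "read")},
}
M2_DELTA, M2_Q0, M2_FINAL = postpone_reads(
    M_DELTA, states=[0, 1, 2], gamma=["Z"], q0=0, z0="Z",
    final={2}, accepts_empty=False)

for k in range(6):
    print(k, run(M_DELTA, 0, "Z", {2}, k, False),
          run(M2_DELTA, M2_Q0, "Z", M2_FINAL, k, True))
```

In this toy example the original automaton needs one more $\epsilon$-move after the last read before it accepts, while the modified one enters its final state with the read itself.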



Because final states (with the possible exception of $q'_0$) can be reached only with moves
that consume an input symbol, we can conclude that M' satisfies the required property 
of accepting input strings immediately after reading the last symbol. In order to prove 
that M' is equivalent to M, the following lemma is useful (the transition relations between 
configurations are marked with the names of the considered pda's): 



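Lemma 6 For each $k > 0$, $q \in Q$, $\alpha \in \Gamma^{*}$: (a) $(q_0, a^k, Z_0) \vdash^{*}_{M} (q,\epsilon,\alpha)$ if and only if
(b) $(q'_0, a^k, Z_0) \vdash^{*}_{M'} (q,\epsilon,\alpha)$ or (c) $(q'_0, a^{k-1}, Z_0) \vdash^{*}_{M'} (\bar{q},\epsilon,\alpha)$. Furthermore, if $(q_0, a^k, Z_0) \vdash^{*}_{M}
(p,\epsilon,\beta) \vdash^{*}_{M} (q,\epsilon,\alpha)$, for some $p \in F$, $\beta \in \Gamma^{*}$, then (b) holds.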

Note: we recall that, as observed in Section 2, in the unary case we can consider, without
increasing the size, loop-free dpda's.

Proof: The lemma can be proved by induction on the length of the derivations, and
by observing that for $q \in F$, (c) implies (b). Because the proof is very technical and
involves only standard arguments, it is omitted. □



As a consequence of the previous construction and of Lemma 6 we get that $M$ and $M'$ are
equivalent, and hence:

Theorem 5 For each unary pda M in normal form with n states, accepting by final states, 
there exists an equivalent pda M' in normal form with 2n+1 states and the same pushdown
alphabet as M such that each input string w is accepted if and only if the state reached 
immediately after reading the last symbol of w is final. Furthermore, if M is deterministic 
then M' is deterministic, too. 

6 Languages with complex dpda's 

In Section 3 we proved that dpda's can be exponentially more succinct than finite
automata. In this section we show the existence of languages for which this dramatic
reduction of the descriptional complexity cannot be achieved. More precisely, we prove that for
each integer $m$ there exists a unary $2^m$-cyclic language $B_m$ such that the size of each dpda
accepting it is exponential in $m$.

Let us start by introducing the definition of the language $B_m$. To this aim, we first recall
that a de Bruijn word [3] of order $m$ on $\{0,1\}$ is a word $w_m$ of length $2^m + m - 1$ such that
each string of length $m$ is a factor of $w_m$ occurring in $w_m$ exactly one time. Furthermore,
the suffix and the prefix of length $m - 1$ of $w_m$ coincide.
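The properties just recalled can be checked mechanically. The following Python sketch is only an illustration: the function name `is_de_bruijn_word` is hypothetical, and the test word is the order-3 word 0001011100 used as an example in this section.

```python
def is_de_bruijn_word(w, m):
    """Check the stated properties of a de Bruijn word of order m over {0, 1}."""
    if len(w) != 2 ** m + m - 1 or set(w) - {"0", "1"}:
        return False
    # All factors (windows) of length m, from left to right.
    factors = [w[i:i + m] for i in range(len(w) - m + 1)]
    # There are exactly 2^m windows, so "each string occurs exactly once"
    # is the same as "all windows are pairwise distinct".
    each_once = len(set(factors)) == 2 ** m == len(factors)
    # Prefix and suffix of length m - 1 must coincide (both empty if m == 1).
    borders_agree = w[:m - 1] == w[len(w) - (m - 1):]
    return each_once and borders_agree


print(is_de_bruijn_word("0001011100", 3))
```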

We consider the following language (the same language was considered in [5] for a different problem):

$$B_m = \{a^k \mid \text{the } (k \bmod' 2^m)\text{th letter of } w_m \text{ is } 1\},$$

where

$$x \bmod' y = \begin{cases} x \bmod y & \text{if } x \bmod y > 0 \\ y & \text{otherwise.} \end{cases}$$

For example, $w_3 = 0001011100$ and $B_3 = \{a^0, a^4, a^6, a^7\}\{a^8\}^{*}$.

By definition and by the above mentioned properties of de Bruijn words, $B_m$ is a properly
$2^m$-cyclic unary language. Hence, the minimal 1dfa accepting it has exactly $2^m$ states
(actually, by Theorem 9 in [11], this number of states is required even by each 2nfa
accepting $B_m$). We show that even the size of each dpda accepting $B_m$ must be exponential
in $m$. More precisely:

Theorem 6 There is a constant $d$, such that for each $m > 0$ the size of any dpda accepting
$B_m$ is at least $d\,\frac{2^m}{m^2}$.

Proof: Let us consider a dpda $M$ of size $s$ accepting $B_m$. We will show that from $M$ it is
possible to build a grammar with $O(sm)$ variables generating the language which consists
only of the word $w_m$. Hence, the result will follow from a lower bound presented in [1],
related to the generation of $w_m$.

First of all, by Theorem 5, from $M$ it is possible to get an equivalent dpda $M'$ of size
$O(s)$, such that $M'$ is able to accept or reject each string $a^k$ immediately after reading the
$k$th letter of the input.

We also consider a 1dfa $A$ accepting the language $L$ which consists of all strings $x$ on the
alphabet $\{0,1\}$, such that $x = yw$, where $w$ is the suffix of length $m$ of $w_m$, and $w$ is not
a proper factor of $x$, i.e., $x = x'w$, and $x = x''ww'$ implies $w' = \epsilon$. Note that $A$ can be
implemented with $m + 1$ states. The automaton $A$ will be used in the following to modify
the control of $M'$, in order to force it to accept only the string $a^{2^m+m-1}$.
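One way to realize such an automaton with $m + 1$ states is the usual pattern-matching construction, where state $i$ records that the longest prefix of $w$ which is a suffix of the input read so far has length $i$. The Python sketch below is an assumption of this note, not the construction of the paper: the names are hypothetical, the prefix computation is deliberately naive, and the accepting state $m$ is given no outgoing transitions, so that any string continuing after the first occurrence of $w$ is rejected.

```python
def first_occurrence_dfa(w):
    """Partial DFA with states 0..len(w) for 'x ends with the first occurrence of w'."""
    m = len(w)
    delta = {}
    for i in range(m):  # state m (accepting) has no outgoing transitions
        for c in "01":
            s = w[:i] + c
            # Length of the longest prefix of w that is a suffix of s.
            j = len(s)
            while j > 0 and s[len(s) - j:] != w[:j]:
                j -= 1
            delta[(i, c)] = j
    return delta


def accepts(delta, m, x):
    """Run the partial DFA on x; an undefined transition means rejection."""
    state = 0
    for c in x:
        if (state, c) not in delta:
            return False
        state = delta[(state, c)]
    return state == m


DELTA = first_occurrence_dfa("100")  # "100" is the suffix of length 3 of 0001011100
print(accepts(DELTA, 3, "0001011100"), accepts(DELTA, 3, "100100"))
```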

To this aim, we describe a new dpda $M''$. Each state of $M''$ simulates one state of $M'$
and one state of $A$. The initial state of $M''$ is the pair of the initial states of $M'$ and $A$.
$M''$ simulates the moves of $M'$ step by step. When a transition which reads an input symbol
is simulated, $M''$ also simulates one move of $A$ on input $\sigma \in \{0,1\}$, where $\sigma = 1$ if
the transition of $M'$ leads to an accepting state, and $\sigma = 0$ otherwise. In this way, the automaton
$A$ will finally receive as input the word $w_m$. When the simulation reaches the accepting
state of $A$, namely the end of $w_m$ has been reached, $M''$ stops and accepts. Thus, the only
string accepted by $M''$ is $a^{2^m+m-1}$.
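The effect of this construction can be simulated directly from the definition of $B_m$, without building any automaton. The Python sketch below is only an illustration with hypothetical function names: it produces, for $k = 1, 2, \ldots$, the bit telling whether $a^k$ belongs to $B_m$, and stops as soon as the suffix of length $m$ of $w_m$ has appeared. It assumes that its argument really is a de Bruijn word of order $m$; otherwise the loop need not terminate.

```python
def mod_prime(x, y):
    """x mod' y: like x mod y, but with values in 1..y instead of 0..y-1."""
    r = x % y
    return r if r > 0 else y


def in_B(k, w, m):
    """True iff a^k is in B_m, i.e. letter number (k mod' 2^m) of w is 1."""
    return w[mod_prime(k, 2 ** m) - 1] == "1"


def bits_until_stop(w, m):
    """Bits passed to A: one per input letter read, until the suffix of
    length m of w has been seen for the first time."""
    target, bits, k = w[-m:], "", 0
    while not bits.endswith(target):
        k += 1
        bits += "1" if in_B(k, w, m) else "0"
    return bits


W3 = "0001011100"
print([k for k in range(8) if in_B(k, W3, 3)])
print(bits_until_stop(W3, 3))
```

For the order-3 example, the bit sequence stops after $2^3 + 3 - 1 = 10$ letters and coincides with $w_3$, as the argument requires.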

Using the construction presented in Section 4 we can build a context-free grammar $G$
equivalent to $M''$. We modify the productions of $G$ that correspond to operations which
consume input symbols: each production $[qA]_b \to a[pA]_b$ is replaced by $[qA]_b \to 1[pA]_b$ if $p$
corresponds to a final state of $M'$, and by $[qA]_b \to 0[pA]_b$ otherwise. It is easy to observe
that the grammar $G'$ so obtained generates the language $\{w_m\}$. Furthermore, the size of
$G'$ is bounded by $ksm$, for some constant $k$. By a result presented in [3] (based on a lower
bound from [1]), the number of variables of $G'$ must be at least $c\,\frac{2^m}{m}$ for some constant $c$.
Hence, from $ksm \geq c\,\frac{2^m}{m}$, we finally get that the size of the original dpda $M$ must be at
least $d\,\frac{2^m}{m^2}$ for some constant $d$. □